MG205: Econometrics Theory and Applications

Topic 8: Exploiting Time Variation

José Ignacio González Rojas

London School of Economics and Political Science

February 2, 2026

Cross-Sectional Data Cannot Separate Heterogeneity from Treatment Effects

Panel Data Gives Us New Tools to Address Endogeneity

The problem

  • Endogeneity: \(\text{Cov}(x_{it}, e_{it}) \neq 0\)
  • Violation of Assumption 5 \(\Rightarrow\) no identification
  • OLS gives biased estimates of the parameters of interest
  • Cross-sectional data alone cannot fix this

Today

  • Assume a particular error structure: \(e_{it} = \alpha_i + u_{it}\)
  • With panel data, construct estimators invariant to \(\alpha_i\)

Following the same units over time enables new identification and estimation strategies.

Two Estimators Remove Unit-Level Unobserved Heterogeneity

First Differences and LSDV

First Differences (FD)

  • Population model: \(y_{it} = \beta x_{it} + \alpha_i + u_{it}\)
  • Subtract consecutive observations:

\[\Delta y_{it} = \beta \Delta x_{it} + \Delta u_{it}\]

  • \(\alpha_i - \alpha_i = 0\): unobserved heterogeneity disappears

Least Squares Dummy Variables (LSDV)

  • Include a dummy for each unit \(i\):

\[y_{it} = \beta x_{it} + \sum_{j=1}^{N} \gamma_j \mathbb{1}[i=j] + u_{it}\]

  • The dummies absorb \(\alpha_i\)
  • Equivalent to FD for \(T=2\) (we prove this later)
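The claims above can be checked on simulated data. This is a minimal sketch, not part of the original exercise: the data-generating process, the sample size, and the helper `ols` are all assumptions chosen for illustration. It builds a two-period panel in which \(\alpha_i\) is correlated with \(x_{it}\), then compares pooled OLS, FD, and LSDV.

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta = 200, 2.0          # illustrative values, not from the lecture

# Unit effect alpha_i is correlated with x_it, so pooled OLS is biased.
alpha = rng.normal(size=N)
x = alpha[:, None] + rng.normal(size=(N, 2))              # shape (N, T=2)
y = beta * x + alpha[:, None] + rng.normal(size=(N, 2))


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Pooled OLS with an intercept: alpha_i stays in the error.
X_pooled = np.column_stack([np.ones(2 * N), x.ravel()])
b_pooled = ols(X_pooled, y.ravel())[1]

# First differences: regress Delta y on Delta x (the DGP has no time effect).
dx = x[:, 1] - x[:, 0]
dy = y[:, 1] - y[:, 0]
b_fd = ols(dx[:, None], dy)[0]

# LSDV: x plus one dummy per unit (N dummies, no separate intercept).
unit = np.repeat(np.arange(N), 2)                          # matches x.ravel() order
D = (unit[:, None] == np.arange(N)[None, :]).astype(float)
b_lsdv = ols(np.column_stack([x.ravel(), D]), y.ravel())[0]

print(f"pooled OLS: {b_pooled:.3f}  FD: {b_fd:.3f}  LSDV: {b_lsdv:.3f}")
```

In this setup FD and LSDV return the same slope up to floating-point error, while pooled OLS is pulled away from the true \(\beta\) by the correlation between \(x_{it}\) and \(\alpha_i\).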

Exercise 1: Unobserved Heterogeneity Biases Cross-Sectional Estimates

Airline Fares Depend on Unobserved Route and Time Characteristics

Two Sources of Omitted Variable Bias

\[ \begin{aligned} \log(\text{fare})_{it} &= \beta_0 + \beta_1\log(\text{distance})_i + \beta_2\text{competition}_{it} + e_{it} \\ e_{it} &= \gamma_i + \delta_t + u_{it} \end{aligned} \]

  • \(\gamma_i\): route-specific, time-invariant unobserved heterogeneity
    • Business relationships
    • Airport amenities
  • \(\delta_t\): common time shocks
    • Fuel prices
    • Economic conditions
  • \(u_{it}\) is idiosyncratic error
  • We worry that \(\text{Cov}(\text{competition}_{it}, \gamma_i) \neq 0\)
    • Model not identified
    • OLS estimates are linear projections
    • \(\hat{\beta}\) may be a biased estimate of the structural parameter \(\beta\)

Controlling for Distance and Year Dummies Does Not Remove Route Heterogeneity

The Proposed Model Falls Short

Estimated model

\[\begin{align*} \widehat{\log(\text{fare})}_{it} &= \hat{\beta}_{0} + \hat{\beta}_{1}\log(\text{distance})_{i} \\ &+ \hat{\beta}_{2}\text{competition}_{it} \\ &+ \hat{\delta}_{1}\mathbb{1}[t=2007] \\ &+ \hat{\delta}_{2}\mathbb{1}[t=2012] \end{align*}\]

What remains in the error?

  • Recall: \(e_{it} = \gamma_i + \delta_t + u_{it}\)
  • The year dummies address common time trends (\(\delta_t\))
  • \(\gamma_i\) remains in the error

Since \(\text{Cov}(\text{competition}_{it}, \gamma_i) \neq 0\), OLS is biased.

First-Differencing Eliminates Route-Level Unobserved Heterogeneity

The First-Difference Estimator

  • First-difference operator: \(\Delta x_{it} = x_{it} - x_{it-1}\)
  • Example: Take the USA–UK route. Subtract 2002 from 2007, and 2007 from 2012.

\[ \Delta\log(\text{fare})_{it} = \beta_2\Delta\text{competition}_{it} + \Delta\delta_t + \Delta u_{it} \]

\(\gamma_i - \gamma_i = 0\): time-invariant route characteristics disappear.

With year dummies for transition periods (2002–2007 base, 2007–2012):

\[ \widehat{\Delta\log(\text{fare})}_{it} = \hat\alpha + \hat\beta_2\Delta\text{competition}_{it} + \hat\delta\mathbb{1}[\text{transition } 2007-2012] \]
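To see where the intercept and the dummy coefficient come from, here is a sketch of the two transitions written out. The labels \(\hat\alpha\) and \(\hat\delta\) are those of the estimated equation above; the mapping to \(\delta_t\) follows from the error structure \(e_{it} = \gamma_i + \delta_t + u_{it}\) and is an added step rather than something stated on the slide.

\[ \begin{align*} \log(\text{fare})_{i,2007} - \log(\text{fare})_{i,2002} &= \beta_2\Delta\text{competition}_{i,2007} + (\delta_{2007} - \delta_{2002}) + \Delta u_{i,2007} \\ \log(\text{fare})_{i,2012} - \log(\text{fare})_{i,2007} &= \beta_2\Delta\text{competition}_{i,2012} + (\delta_{2012} - \delta_{2007}) + \Delta u_{i,2012} \end{align*} \]

So \(\hat\alpha\) estimates \(\delta_{2007} - \delta_{2002}\), and \(\hat\alpha + \hat\delta\) estimates \(\delta_{2012} - \delta_{2007}\). Both \(\beta_0\) and \(\beta_1\log(\text{distance})_i\) cancel because they do not vary over time.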

Combining FD with Year Dummies Addresses Both Sources

Two Strategies for Two-Way Heterogeneity

FD removes \(\gamma_i\) (unit FE)

  • Subtract consecutive observations
  • \(\gamma_i - \gamma_i = 0\)
  • Time-invariant variables also drop out: \(\Delta\log(\text{distance})_i = 0\)

Year dummies absorb \(\delta_t\) (time FE)

  • Include dummies for transition periods
  • Common time shocks captured
  • This is LSDV applied to time effects

Two options: (1) FD plus year dummies, or (2) full LSDV with dummies for both units and time periods.
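The two strategies can be compared on a simulated three-period panel. This is a sketch under assumed values: the route effects, year effects, and the strength of the correlation between competition and \(\gamma_i\) are invented for illustration and are not the airline data.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta = 400, 3, -0.5                    # illustrative values

gamma = rng.normal(size=N)                   # route effects gamma_i
delta = np.array([0.0, 0.3, 0.8])            # year effects delta_t
comp = 0.8 * gamma[:, None] + rng.normal(size=(N, T))   # correlated with gamma_i
y = beta * comp + gamma[:, None] + delta[None, :] + rng.normal(size=(N, T))


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


year = np.tile(np.arange(T), N)              # matches comp.ravel() order
route = np.repeat(np.arange(N), T)
year_d = (year[:, None] == np.arange(1, T)[None, :]).astype(float)   # base: first year
route_d = (route[:, None] == np.arange(N)[None, :]).astype(float)

# Pooled OLS with year dummies only: gamma_i remains in the error.
X0 = np.column_stack([np.ones(N * T), comp.ravel(), year_d])
b_pooled = ols(X0, y.ravel())[1]

# Strategy 1: first differences plus a dummy for the second transition.
dcomp = np.diff(comp, axis=1)                # shape (N, 2)
dy = np.diff(y, axis=1)
second = np.tile([0.0, 1.0], N)
X1 = np.column_stack([np.ones(2 * N), dcomp.ravel(), second])
b_fd = ols(X1, dy.ravel())[1]

# Strategy 2: two-way LSDV with N route dummies and T-1 year dummies.
X2 = np.column_stack([comp.ravel(), route_d, year_d])
b_lsdv = ols(X2, y.ravel())[0]

print(f"pooled: {b_pooled:.3f}  FD: {b_fd:.3f}  two-way LSDV: {b_lsdv:.3f}")
```

With \(T = 3\) the FD and LSDV estimates are no longer numerically identical, but both should land near the true coefficient, whereas pooled OLS with year dummies does not.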

Rejecting the Null Does Not Validate the Model

The Trap

  • With robust \(t\)-statistics, we reject \(H_{0}: \beta_{\text{competition}} = 0\)
  • But the test is only informative about \(\beta\) if the model is correctly specified
  • If OVB remains (route-level heterogeneity not addressed), \(\hat{\beta}\) is biased
  • Statistical significance \(\neq\) valid causal interpretation
  • The estimate is a linear projection
  • FD reduces bias from time-invariant confounders but does not eliminate all sources

Exercise 2: Time Effects Capture Industry-Wide Patent Growth

Industry-Wide Patent Growth Requires Flexible Time Effects

Setting

  • 37 pharmaceutical firms
  • 2005-2007
  • No OVB concerns
    • Causal interpretation
  • Patents growing industry-wide
    • Regardless of individual firm R&D

Model

\[\begin{align*} \log(\text{patents})_{it} &= \beta_0 + \beta_1\log(\text{R\&D})_{it} \\ &+ \beta_2\mathbb{1}[t=2006] + \beta_3\mathbb{1}[t=2007] \\ &+ e_{it} \end{align*}\]

  • Could model the trend linearly or quadratically
  • Year dummies allow any form — nonparametric
  • \(\beta_{1}\): elasticity of patents w.r.t. R&D (causal)

Year Dummy Coefficients Measure Growth Rates

Conditional Expectations Are The Tool to Interpret

\[\begin{align*} \mathbb{E}[\log(\text{patents})_{it} \mid t=2005] &= \beta_0 + \beta_1\log(\text{R\&D})_{it} \\ \mathbb{E}[\log(\text{patents})_{it} \mid t=2006] &= (\beta_0 + \beta_2) + \beta_1\log(\text{R\&D})_{it} \\ \mathbb{E}[\log(\text{patents})_{it} \mid t=2007] &= (\beta_0 + \beta_3) + \beta_1\log(\text{R\&D})_{it} \end{align*}\]

  • Average log change in patents across all firms
  • Geometric mean growth rate of patents in the industry
    • \(\beta_2\): 2005 to 2006
    • \(\beta_3\): 2005 to 2007

Because the outcome is in logs, year-dummy coefficients measure growth relative to 2005, holding R&D fixed, not differences in patent counts.
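A worked sketch of the conversion from log points to a growth rate. The value \(\beta_2 = 0.10\) is purely illustrative, not an estimate from the data.

\[ \begin{align*} \mathbb{E}[\log(\text{patents})_{it} \mid t=2006] - \mathbb{E}[\log(\text{patents})_{it} \mid t=2005] &= \beta_2 \quad \text{(same } \log(\text{R\&D})\text{)} \\ \text{growth in the geometric mean} &= \exp(\beta_2) - 1 \\ \beta_2 = 0.10 \;&\Rightarrow\; \exp(0.10) - 1 \approx 0.105 \end{align*} \]

For small coefficients, \(\exp(\beta_2) - 1 \approx \beta_2\), so \(100\,\beta_2\) can be read as an approximate percentage growth.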

Interactions Allow the R&D Elasticity to Vary Over Time

Heterogeneous Elasticities

\[ \begin{align*} \log(\text{patents})_{it} &= \beta_0 + \beta_1\log(\text{R\&D})_{it} + \beta_2\mathbb{1}[t=2006] + \beta_3\mathbb{1}[t=2007] \\ &+ \beta_4(\log(\text{R\&D})_{it} \times \mathbb{1}[t=2006]) + \beta_5(\log(\text{R\&D})_{it} \times \mathbb{1}[t=2007]) \\ &+ e_{it} \end{align*} \]

Conditional means by year

\[\begin{align*} \mathbb{E}[\log(\text{patents})_{it} \mid t=2005] &= \beta_0 + \beta_1\log(\text{R\&D})_{it} \\ \mathbb{E}[\log(\text{patents})_{it} \mid t=2006] &= (\beta_0 + \beta_2) + (\beta_1 + \beta_4)\log(\text{R\&D})_{it} \\ \mathbb{E}[\log(\text{patents})_{it} \mid t=2007] &= (\beta_0 + \beta_3) + (\beta_1 + \beta_5)\log(\text{R\&D})_{it} \end{align*}\]

Level Shifts and Slope Shifts Are Separately Identified

Decomposing Differential Effects

\[ \begin{array}{lll} \text{Year} & \text{Intercept} & \text{Elasticity} \\ \hline 2005 & \beta_0 & \beta_1 \\ 2006 & \beta_0 + \beta_2 & \beta_1 + \beta_4 \\ 2007 & \beta_0 + \beta_3 & \beta_1 + \beta_5 \end{array} \]

How the patents-R&D elasticity changes over time

  • \(\beta_4\): elasticity change 2005 \(\to\) 2006
  • \(\beta_5\): elasticity change 2005 \(\to\) 2007

No need to interpret \(\beta_4\) and \(\beta_5\) individually; the conditional means do the work.
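A short sketch of how the year-specific intercepts and elasticities can be read off the interacted regression. The data are simulated with made-up parameter values (37 firms per year to mirror the setting); nothing here comes from the actual patent data.

```python
import numpy as np

rng = np.random.default_rng(2)
n_firms = 37
year = np.repeat(np.array([2005, 2006, 2007]), n_firms)
d06 = (year == 2006).astype(float)
d07 = (year == 2007).astype(float)

# Simulated log R&D and log patents; all coefficients below are assumptions.
log_rd = rng.normal(3.0, 1.0, size=year.size)
log_pat = (
    (1.0 + 0.1 * d06 + 0.3 * d07)                 # intercepts: 1.0, 1.1, 1.3
    + (0.5 + 0.1 * d06 + 0.3 * d07) * log_rd      # elasticities: 0.5, 0.6, 0.8
    + rng.normal(0.0, 0.05, size=year.size)
)

# Columns follow the order beta_0, ..., beta_5 in the model above.
X = np.column_stack(
    [np.ones_like(log_rd), log_rd, d06, d07, log_rd * d06, log_rd * d07]
)
b = np.linalg.lstsq(X, log_pat, rcond=None)[0]

# Conditional means by year: add the relevant shift to the 2005 baseline.
intercept = {2005: b[0], 2006: b[0] + b[2], 2007: b[0] + b[3]}
elasticity = {2005: b[1], 2006: b[1] + b[4], 2007: b[1] + b[5]}

for t in (2005, 2006, 2007):
    print(t, round(intercept[t], 3), round(elasticity[t], 3))
```

The fully interacted model is equivalent to running a separate regression for each year, which is why the sums \(\beta_1 + \beta_4\) and \(\beta_1 + \beta_5\) are the natural objects to report.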

Exercise 3: Empirical Models Interact All Relevant Variables

Gender Wage Gaps Changed After the Mining Boom

Three Patterns from the Data

  • Gender: Men earn a constant wage premium over women
  • Time trend: Wages grow over time for both groups
  • Structural break (2005): After the mining company arrives, the male premium widens

The Empirical Model Interacts Gender, Time, and Post-2005

Eight Coefficients for Four Groups

\[ \begin{align*} \log(\text{wages})_{it} &= \beta_0 + \beta_1\mathbb{1}[i\text{ is male}] + \beta_2 t + \beta_3\mathbb{1}[t \geq 2005] \\ &+ \beta_4(\mathbb{1}[i\text{ is male}] \times t) + \beta_5(\mathbb{1}[i\text{ is male}] \times \mathbb{1}[t \geq 2005]) \\ &+ \beta_6(t \times \mathbb{1}[t \geq 2005]) + \beta_7(t \times \mathbb{1}[t \geq 2005] \times \mathbb{1}[i\text{ is male}]) \\ &+ e_{it} \end{align*} \]

This model captures level differences, trends, and how both changed after 2005, separately for men and women.

Conditional Means: Pre-2005

Women and Men Before the Structural Break

Women before 2005 (base category)

\[ \mathbb{E}[\log(\text{wages})_{it} \mid \mathbb{1}[i \text{ is male}]=0,t<2005] = \beta_0 + \beta_2 t \]

Men before 2005

\[ \mathbb{E}[\log(\text{wages})_{it} \mid \mathbb{1}[i \text{ is male}]=1,t<2005] = (\beta_0 + \beta_1) + (\beta_2 + \beta_4)t \]

\(\beta_1\) shifts the intercept; \(\beta_4\) shifts the slope.

Conditional Means: Post-2005

Women and Men After the Structural Break

Women from 2005 onwards

\[ \mathbb{E}[\log(\text{wages})_{it} \mid \mathbb{1}[i \text{ is male}]=0,\; t \geq 2005] = (\beta_0 + \beta_3) + (\beta_2 + \beta_6)t\]

Men from 2005 onwards

\[\begin{align*} \mathbb{E}[\log(\text{wages})_{it} \mid \mathbb{1}[i \text{ is male}]=1,\; t \geq 2005] &= (\beta_0 + \beta_1 + \beta_3 + \beta_5) \\ &\quad + (\beta_2 + \beta_4 + \beta_6 + \beta_7)t \end{align*}\]

Each coefficient modifies either the intercept or slope for a specific group-period combination.

Taking Differences Isolates Each Coefficient’s Role

Condition on Group, Then Difference
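The slide body is not reproduced here, so the following is a sketch of the differences implied by the four conditional means above, holding \(t\) fixed.

\[ \begin{align*} \text{Men} - \text{women},\; t<2005: &\quad \beta_1 + \beta_4 t \\ \text{Men} - \text{women},\; t \geq 2005: &\quad (\beta_1 + \beta_5) + (\beta_4 + \beta_7)t \\ \text{Post} - \text{pre, women}: &\quad \beta_3 + \beta_6 t \\ \text{Post} - \text{pre, men}: &\quad (\beta_3 + \beta_5) + (\beta_6 + \beta_7)t \end{align*} \]

Differencing the two gender gaps gives \(\beta_5 + \beta_7 t\), the change in the male premium after the break.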

Coefficient Signs Follow Directly from the Figure

Economic Interpretation

Positive (\(> 0\))

  • \(\beta_1\): male premium
  • \(\beta_2\): wages grow over time
  • \(\beta_7\): male wages grow faster post-2005

Zero (\(= 0\))

  • \(\beta_3\): no level break for women at 2005
  • \(\beta_4\): same pre-2005 growth rate
  • \(\beta_6\): female growth unchanged post-2005

Negative (\(< 0\))

  • \(\beta_5\): the additional intercept shift for men from 2005. The steeper post-2005 male slope (\(\beta_7 > 0\)), extrapolated back to \(t = 0\), implies a lower intercept than the other coefficients alone predict

Exercise 4: Panel Data Enables Identification and Estimation

Panel Data Follows the Same Units Over Time

Definition and Structure

  • \(y_{it}\), \(x_{it}\) for \(i = 1, \ldots, N\) and \(t = 1, \ldots, T\)
  • Panel data: same units observed across multiple time periods
  • Cross-section: one \(t\) only
  • Repeated cross-section: same population, different individuals each period

Error decomposition

\[ e_{it} = \alpha_i + v_{it} \]

Panel Structure Enables Identification of Parameters

Identification vs Estimation

Identification

  • Could we recover unique values for each parameter?
    • Cross-section: \(\alpha_i\) in error, correlated with \(x_{it}\) \(\to\) cannot identify \(\beta\)
    • Panel: difference out \(\alpha_i\)
  • Requires: \(\mathbb{E}[\Delta v_{it} \mid \Delta x_{it}] = 0\), which holds under strict exogeneity

Estimation

  • Given identification, how do we compute \(\hat\beta\) from the data?

\[ \Delta y_i = \delta + \beta_1\Delta x_{i1} + \cdots + \beta_k\Delta x_{ik} + \Delta v_i \]

  • Requires \(T \geq 2\) and within-unit variation

Example: cannot estimate returns to education via FD if education does not change over time.

Panel Data Reduces OVB but Cannot Eliminate All Sources of Bias

Solves

  • Time-invariant OVB (\(\alpha_i\)), removed by differencing or unit dummies
  • Unit-invariant OVB (common time shocks \(\lambda_t\)), absorbed by time dummies

Does not solve

  • Time-varying confounders (left in \(v_{it}\))
  • Measurement error
  • Selection bias

Less scope for OVB, but not zero.

Exercise 5: First-Differencing Amplifies Measurement Error

Education Is Measured with Error in Both Periods

True vs Observed Variables

\[ \text{education}_{it} = \text{education}^{*}_{it} + e_{it} \]

Assumptions

  • \(e_{it}\) uncorrelated with true education and other variables
  • Education varies little over time for adults

Cross-Sectional Attenuation Bias Shrinks the Coefficient Towards Zero

The Baseline Problem

Population model

\[ \log(\text{wage})_i = \alpha + \beta\text{education}^{*}_i + \epsilon_i \]

Observed model substitutes \(\text{education}_i = \text{education}^{*}_i + e_i\)

Attenuation bias

\[ \text{plim}\;\hat\beta = \beta \cdot \frac{\text{Var}(\text{educ}^*)}{\text{Var}(\text{educ}^*) + \text{Var}(e)} \]

The ratio is less than 1, so the coefficient is biased towards zero (derived in Topic 6).

First-Differencing Increases the Measurement Error Variance

The Panel Data Paradox

FD of observed education

\[ \Delta\text{education}_i = \Delta\text{education}^{*}_{i} + (e_{i2} - e_{i1}) \]

If \(e_{i1}\) and \(e_{i2}\) are uncorrelated

\[ \text{Var}(e_{i2} - e_{i1}) = \text{Var}(e_{i1}) + \text{Var}(e_{i2}) \]

Panel Data Involves a Fundamental Bias Trade-off

FD Eliminates Fixed Confounders but Amplifies Measurement Error

\[ \text{plim}\;\hat\beta_{\text{FD}} = \beta \cdot \frac{\text{Var}(\Delta\text{educ}^*)}{\text{Var}(\Delta\text{educ}^*) + \text{Var}(e_{i1}) + \text{Var}(e_{i2})} \]

Benefits

  • Eliminates \(\alpha_i\), reduces OVB from time-invariant confounders
  • Enables causal identification under strict exogeneity

Costs

  • Measurement error variance in the denominator grows from \(\text{Var}(e)\) to \(\text{Var}(e_{i1}) + \text{Var}(e_{i2})\)
  • Numerator small if \(x\) changes little over time

Attenuation ratio is smaller — more severe bias towards zero.
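The trade-off can be illustrated with a small simulation. This is a sketch under assumed variances (true education variance 9, true change variance 0.25, measurement error variance 1 in each period); the return to education of 0.10 is also an assumption. No unit effect is included, so any shortfall from \(\beta\) is due to measurement error alone.

```python
import numpy as np

rng = np.random.default_rng(3)
N, beta = 20000, 0.10                      # illustrative values


def slope(x, y):
    """Bivariate OLS slope of y on x (with an intercept)."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)


# True education: large cross-sectional variance, small change over time.
educ1 = rng.normal(12.0, 3.0, size=N)              # Var = 9
educ2 = educ1 + rng.normal(0.0, 0.5, size=N)       # Var(change) = 0.25

# Wages depend on TRUE education.
wage1 = beta * educ1 + rng.normal(0.0, 0.1, size=N)
wage2 = beta * educ2 + rng.normal(0.0, 0.1, size=N)

# Observed education adds independent measurement error, Var = 1 per period.
obs1 = educ1 + rng.normal(0.0, 1.0, size=N)
obs2 = educ2 + rng.normal(0.0, 1.0, size=N)

b_cs = slope(obs1, wage1)                          # cross-section, period 1
b_fd = slope(obs2 - obs1, wage2 - wage1)           # first differences

print(f"CS ratio: {b_cs / beta:.2f}  (formula: {9 / (9 + 1):.2f})")
print(f"FD ratio: {b_fd / beta:.2f}  (formula: {0.25 / (0.25 + 2):.2f})")
```

The simulated ratios should sit close to the values implied by the two attenuation formulas, with the FD ratio far below the cross-sectional one.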

Exercise 6: Panel Data Cannot Solve Selection Bias

Roommate Nationality May Affect Student Grades

Self-Selection into Rooms Creates Endogeneity

Let \(\text{same}_{it} = \mathbb{1}[i\text{ has same-nationality roommate in } t]\)

\[ \text{grades}_{it} = \alpha + \beta\;\text{same}_{it} + e_{it} \]

  • Exogeneity holds under random assignment of roommates
  • If students choose roommates: an omitted equation determines room selection
  • More outgoing students may prefer roommates of a different nationality and also perform differently academically
  • \(\text{Cov}(\text{same}_{it}, e_{it}) \neq 0\) arises from selection, not unobserved heterogeneity

Two Years of Data Require Roommate Changes

Panel Structure and Exogenous Mobility

Year 1

\(\text{grades}_{i1} = \alpha + \beta\;\text{same}_{i1} + \alpha_i + u_{i1}\)

Year 2

\(\text{grades}_{i2} = \alpha + \beta\;\text{same}_{i2} + \alpha_i + u_{i2}\)

First-differencing

\(\Delta\text{grades}_i = \beta\;\Delta\text{same}_i + \Delta u_i\)

  • Critical: students must change roommates (\(\Delta\text{same}_i \neq 0\) for some)
  • Exogenous mobility design — plausible if the university reassigns rooms

The Problem Is Selection, Not Unobserved Heterogeneity

What Panel Data Cannot Fix

  • FD removes \(\alpha_i\) (unobserved ability) — but the core problem is selection into rooms
  • If reasons for changing roommates correlate with grade changes, FD does not help
  • Panel data addresses unobserved heterogeneity (Exercises 1, 4, 5)
  • It does not address selection bias

Random Assignment Plus Panel Data Strengthens Identification

You Get What You Pay For

  • Random assignment of roommates:
    • \(\beta\) is identified even in cross-section
    • Panel adds precision: removes \(\alpha_i\) from the error \(\to\) smaller variance \(\to\) smaller standard errors
  • Potential selection: panel data alone cannot solve the selection problem
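A simulated sketch of the contrast between random reassignment and self-selected roommate changes. Everything here is an assumption made for illustration: the true effect of 0.5, the selection rule (students whose grade shock is falling tend to move in with a same-nationality roommate), and the variances.

```python
import numpy as np

rng = np.random.default_rng(4)
N, beta = 20000, 0.5                       # illustrative values

alpha = rng.normal(size=N)                 # time-invariant ability alpha_i
u = rng.normal(size=(N, 2))                # idiosyncratic shocks u_it
du = u[:, 1] - u[:, 0]


def slope(x, y):
    """Bivariate OLS slope of y on x (with an intercept)."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)


def fd_estimate(same):
    """First-difference estimate of beta for a given (N, 2) roommate indicator."""
    grades = beta * same + alpha[:, None] + u
    return slope(same[:, 1] - same[:, 0], grades[:, 1] - grades[:, 0])


# Case 1: the university assigns roommates at random in both years.
same_random = rng.integers(0, 2, size=(N, 2)).astype(float)
b_random = fd_estimate(same_random)

# Case 2: year-2 roommate choice depends on the change in the grade shock.
same_selected = same_random.copy()
same_selected[:, 1] = (du + rng.normal(size=N) < 0).astype(float)
b_selected = fd_estimate(same_selected)

print(f"FD, random assignment: {b_random:.3f}")
print(f"FD, self-selection:    {b_selected:.3f}")
```

In both cases differencing removes \(\alpha_i\), yet only the randomly assigned panel recovers the true effect. Under the assumed selection rule, \(\Delta\text{same}_i\) is correlated with \(\Delta u_i\) and the FD estimate is badly biased.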

Panel Data Enables Estimators Invariant to Unobserved Heterogeneity

Summary

  1. FD and LSDV address different components of unobserved heterogeneity
  2. Time effects capture growth rates
  3. Interactions capture heterogeneous effects
  4. Measurement error is amplified by FD: a fundamental trade-off
  5. Panel data cannot solve selection bias

Next Week: Topic 8 Part II

  • Clustered standard errors in panel data
  • Linear time trends vs time fixed effects
  • Age effects on earnings
  • Selection vs incentive effects in performance pay
  • Multiple fixed effects and F-tests

Appendix: Detailed Derivations

First-Difference Derivation for General Panel Model

Write the model for \(t = 1\) and \(t = 2\):

\[ \begin{align*} y_{i1} &= \beta_0 + \beta_1 x_{i1,1} + \cdots + \beta_k x_{i1,k} + a_i + v_{i1} \\ y_{i2} &= (\beta_0 + \delta) + \beta_1 x_{i2,1} + \cdots + \beta_k x_{i2,k} + a_i + v_{i2} \end{align*} \]

Subtract: \(a_i - a_i = 0\).

\[ \Delta y_i = \delta + \beta_1\Delta x_{i1} + \beta_2\Delta x_{i2} + \cdots + \beta_k\Delta x_{ik} + \Delta v_i \]

The time-invariant component \(a_i\) has been eliminated. OLS on this equation yields consistent estimates under \(\mathbb{E}[\Delta v_i \mid \Delta x_{i1}, \ldots, \Delta x_{ik}] = 0\).


Attenuation Factor: Cross-Section vs First-Differences

Cross-section

\[ \text{plim}\hat\beta_{\text{CS}} = \beta \cdot \frac{\text{Var}(\text{educ}^*)}{\text{Var}(\text{educ}^*) + \text{Var}(e)} \]

First-differences (assuming uncorrelated measurement errors)

\[ \text{plim}\;\hat\beta_{\text{FD}} = \beta \cdot \frac{\text{Var}(\Delta\text{educ}^*)}{\text{Var}(\Delta\text{educ}^*) + \text{Var}(e_{i1}) + \text{Var}(e_{i2})} \]

  • Since education changes little over time:
    • \(\text{Var}(\Delta\text{educ}^*) \ll \text{Var}(\text{educ}^*)\)
    • The FD attenuation ratio is smaller than the CS ratio, so the bias towards zero is worse in FD
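A numerical sketch with illustrative variances (assumed values, not estimates): \(\text{Var}(\text{educ}^*) = 9\), \(\text{Var}(\Delta\text{educ}^*) = 0.25\), and \(\text{Var}(e) = \text{Var}(e_{i1}) = \text{Var}(e_{i2}) = 1\).

\[ \begin{align*} \text{CS ratio} &= \frac{9}{9 + 1} = 0.90 \\ \text{FD ratio} &= \frac{0.25}{0.25 + 1 + 1} \approx 0.11 \end{align*} \]

Under these values the cross-section recovers about 90% of \(\beta\), while FD recovers about 11%.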
